##%%%%%%%%%%%%%%%%%%%%%%%%%% README %%%%%%%%%%%%%%%%%%%%%%%%%%

This repository contains the following R scripts:

FUNC.R: A collection of helper functions used in the other scripts.

SIMN.R: An R script that takes the original data files, produces
simulation data sets using GenOrd, and computes statistics on
the original model specifications.

InOutRsq.R: A script that fits the models and computes statistics such
as fstat, out.fstat, out.tstatF and their cross-validation versions
for each variable of each model in each specification, for all data set
sizes. The subdirectory "Script Files" contains batch scripts for running
these fits in batch mode. NOTE: These scripts must be adapted to the
user's local environment to run properly, and running all of them, even
in batch mode, will take a day or more.

MERGE.R: A script that merges the files produced by InOutRsq.R and produces
the files rsqAA, rsqMedAA, rsqInpMedAA used in the analysis in ANAL.R.
To avoid the computationally intensive steps in InOutRsq.R (or its batch
versions), use the supplied versions of rsqAA, rsqMedAA, rsqInpMedAA and
skip running InOutRsq.R and MERGE.R.

ANAL.R: A script that produces the analysis tables on the simulation data 
presented in the paper.

ANALdeming.R: A script that produces the reanalysis of the Deming 2009 paper.

NOTE: All "save" and "write" actions in these scripts are commented out
(prefixed with "#"), so to save the computed files, remove the comment
prefix. To run the scripts, create a working directory, put all the files
from this repository in it, then modify the scripts to set the working
directory to that one.
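
The setup described in the note above can be sketched as a small helper.
This is only an illustration: the function name run_scripts, the
skip_heavy argument and the run order are assumptions inferred from the
script descriptions, not something the repository provides. Each script
may still need its own working-directory line edited as described above.

```r
# Hypothetical helper (not part of the repository): source the scripts
# from a single working directory. The order is an assumption based on
# the script descriptions in this README.
run_scripts <- function(work_dir, skip_heavy = TRUE) {
  scripts <- c("FUNC.R", "SIMN.R", "InOutRsq.R", "MERGE.R",
               "ANAL.R", "ANALdeming.R")
  if (skip_heavy) {
    # Rely on the supplied rsqAA, rsqMedAA, rsqInpMedAA files instead of
    # regenerating them with InOutRsq.R and MERGE.R.
    scripts <- setdiff(scripts, c("InOutRsq.R", "MERGE.R"))
  }

  # Fail early if any script is missing from the working directory.
  missing <- scripts[!file.exists(file.path(work_dir, scripts))]
  if (length(missing) > 0) {
    stop("Missing from working directory: ",
         paste(missing, collapse = ", "))
  }

  # Switch to the working directory and restore the previous one on exit.
  old_dir <- setwd(work_dir)
  on.exit(setwd(old_dir), add = TRUE)

  for (s in scripts) {
    message("Running ", s)
    source(s)  # evaluated in the global environment, so FUNC.R helpers persist
  }
  invisible(scripts)
}
```

For example, run_scripts("path/to/working/dir") would source FUNC.R,
SIMN.R, ANAL.R and ANALdeming.R in turn, where the path is a placeholder
for the directory chosen above.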

Raw Deming data used in ANALdeming.R is found in two Stata files:
NoncogStd.dta
regressorAndTargetsN.dta
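
As a rough sketch, the two Stata files could be loaded into R as follows.
This README does not say which package ANALdeming.R uses to read them, so
the use of haven::read_dta and the function name read_deming_raw are
assumptions made for illustration only.

```r
# Hypothetical loader (not part of the repository). Assumes the 'haven'
# package is installed; ANALdeming.R may read the files differently.
read_deming_raw <- function(data_dir = ".") {
  files <- c("NoncogStd.dta", "regressorAndTargetsN.dta")
  paths <- file.path(data_dir, files)

  # Report every file that is not in the given directory.
  missing <- files[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Stata file(s) not found: ", paste(missing, collapse = ", "))
  }
  if (!requireNamespace("haven", quietly = TRUE)) {
    stop("Package 'haven' is required to read the .dta files")
  }

  # Return a named list with one data frame per file.
  out <- lapply(paths, haven::read_dta)
  names(out) <- sub("\\.dta$", "", files)
  out
}
```

Calling read_deming_raw() from the working directory would return a list
with elements NoncogStd and regressorAndTargetsN.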

The original data is contained in six data files:
tins2: t1+t2+xx2 
tins3: t1+t2+xx3 
tins4: t1+t2+xx4 
tins4S: t1+t2+xx4S
tins7: t1+t2+xx7 (Note: Here t1 and t2 are the cluster-centered versions).
tins7S: t1+t2+xx7S (Note: Here t1 and t2 are the cluster-centered versions).

Precomputed model fits with fstats, tstats, cv.fstat, out.fstat, etc.:
rsqAA: Model and ind.var statistics for all variables/sims/data set sizes.
rsqInpMedAA: Means and medians for all model and ind.var statistics 
             across all 200 models.
rsqMedAA: The data from rsqInpMedAA aligned with rsqAA data used for
             statistical comparisons.


